Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity
Abstract
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features in common with plasticity mechanisms found experimentally in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP). The other involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). The latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and motivate experimental investigation of whether reward-modulated STDP exists in animals.
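The two rules named in the abstract can be illustrated with a minimal single-synapse sketch. The abstract gives no equations, so everything specific here is an assumption: the function name `simulate_modulated_stdp`, the exponential spike traces, the time constants, the amplitudes, and the learning rate `gamma` are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def simulate_modulated_stdp(pre_spikes, post_spikes, rewards, *, use_trace=True,
                            w0=0.5, dt=1.0, tau_plus=20.0, tau_minus=20.0,
                            tau_z=25.0, a_plus=1.0, a_minus=-1.0, gamma=0.01):
    """Return the final weight of one synapse under reward-modulated STDP.

    pre_spikes, post_spikes: sequences of 0/1, one entry per time step.
    rewards: sequence of floats, the global reward signal per time step.
    All parameter values are illustrative assumptions.
    """
    p_plus = 0.0   # decaying trace of presynaptic spikes (drives potentiation)
    p_minus = 0.0  # decaying trace of postsynaptic spikes (drives depression)
    z = 0.0        # eligibility trace stored at the synapse
    w = w0
    for pre, post, r in zip(pre_spikes, post_spikes, rewards):
        p_plus = p_plus * math.exp(-dt / tau_plus) + a_plus * pre
        p_minus = p_minus * math.exp(-dt / tau_minus) + a_minus * post
        # Instantaneous STDP term: positive for pre-before-post pairings,
        # negative for post-before-pre pairings.
        zeta = p_plus * post + p_minus * pre
        if use_trace:
            # Modulated STDP with eligibility trace: reward gates a
            # decaying memory of recent pairings.
            z = z * math.exp(-dt / tau_z) + zeta
            w += gamma * r * z
        else:
            # Plain modulated STDP: reward gates only the current pairing.
            w += gamma * r * zeta
    return w

# Pre spike at step 0, post spike at step 5, reward delayed until step 15.
steps = 20
pre = [1 if t == 0 else 0 for t in range(steps)]
post = [1 if t == 5 else 0 for t in range(steps)]
delayed_reward = [1.0 if t == 15 else 0.0 for t in range(steps)]

w_trace = simulate_modulated_stdp(pre, post, delayed_reward, use_trace=True)
w_plain = simulate_modulated_stdp(pre, post, delayed_reward, use_trace=False)
print(w_trace, w_plain)
```

In this toy setting the reward arrives ten steps after the spike pairing. The rule without a trace leaves the weight unchanged because no pairing coincides with the reward, while the eligibility-trace variant still credits the earlier pairing, which is the behavior the abstract describes for delayed reward.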
Similar resources
Spike timing dependent plasticity: mechanisms, significance, and controversies
Long-term modification of synaptic strength is one of the basic mechanisms of memory formation and activity-dependent refinement of neural circuits. This idea was proposed by Hebb to provide a basis for the formation of a cell assembly. Repetitive correlated activity of pre-synaptic and post-synaptic neurons can induce long-lasting synaptic strength modification, the direction and extent of whi...
Reinforcement Learning with Modulated Spike Timing-Dependent Synaptic Plasticity
Spike timing-dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general-purpose learning. On the other hand, algorithms for reinforcement learn...
Optimal Spike-Timing Dependent Plasticity for Precise Action Potential Firing in Supervised Learning
In timing-based neural codes, neurons have to emit action potentials at precise moments in time. We use a supervised learning paradigm to derive a synaptic update rule that optimizes via gradient ascent the likelihood of postsynaptic firing at one or several desired firing times. We find that the optimal strategy of up and down regulating synaptic efficacies depends on the relative timing betwe...
Optimal Spike-Timing-Dependent Plasticity for Precise Action Potential Firing in Supervised Learning
In timing-based neural codes, neurons have to emit action potentials at precise moments in time. We use a supervised learning paradigm to derive a synaptic update rule that optimizes by gradient ascent the likelihood of postsynaptic firing at one or several desired firing times. We find that the optimal strategy of up- and downregulating synaptic efficacies depends on the relative timing betwee...
Journal:
- Neural Computation
Volume 19, Issue 6
Pages: -
Publication date: 2007